

Advances in Black-Box VI: Normalizing Flows, Importance Weighting, and Optimization

Neural Information Processing Systems

Recent research has seen several advances relevant to black-box VI, but the current state of automatic posterior inference is unclear. One such advance is the use of normalizing flows to define flexible posterior densities for deep latent variable models. Another direction is the integration of Monte-Carlo methods to serve two purposes: first, to obtain tighter variational objectives for optimization, and second, to define enriched variational families through sampling. However, both flows and variational Monte-Carlo methods remain relatively unexplored for black-box VI. Moreover, on a pragmatic front, there are several optimization considerations, such as the step-size scheme, parameter initialization, and the choice of gradient estimator, for which there is no clear guidance in the existing literature. In this paper, we postulate that black-box VI is best addressed through a careful combination of numerous algorithmic components. We evaluate components relating to optimization, flows, and Monte-Carlo methods on a benchmark of 30 models from the Stan model library.
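The importance-weighted objective the abstract alludes to can be illustrated on a toy 1-D problem. This is a minimal sketch, not the paper's implementation: the standard-normal target, the Gaussian variational family, and all constants are illustrative. Averaging K importance weights inside the log yields a bound that tightens as K grows and is exact when q matches p.

```python
import numpy as np

rng = np.random.default_rng(0)

def log_p(z):
    # Hypothetical target: standard normal log-density (stands in for log p(z, x)).
    return -0.5 * z**2 - 0.5 * np.log(2 * np.pi)

def log_q(z, mu, log_sigma):
    # Gaussian variational density N(mu, sigma^2).
    sigma = np.exp(log_sigma)
    return -0.5 * ((z - mu) / sigma) ** 2 - log_sigma - 0.5 * np.log(2 * np.pi)

def iw_elbo(mu, log_sigma, K=64, n_batches=500):
    """Importance-weighted bound: E[log (1/K) sum_k p(z_k)/q(z_k)], z_k ~ q."""
    vals = []
    for _ in range(n_batches):
        z = mu + np.exp(log_sigma) * rng.standard_normal(K)
        log_w = log_p(z) - log_q(z, mu, log_sigma)
        # log-mean-exp over the K weights gives the tighter objective.
        vals.append(np.logaddexp.reduce(log_w) - np.log(K))
    return float(np.mean(vals))

# When q == p every weight is 1, so the bound is exactly zero;
# a mismatched q yields a strictly lower value.
tight = iw_elbo(0.0, 0.0)
loose = iw_elbo(2.0, 0.0)
```

The same log-mean-exp structure underlies IWAE-style objectives; only the number of inner samples K changes the tightness of the bound.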



Review for NeurIPS paper: Advances in Black-Box VI: Normalizing Flows, Importance Weighting, and Optimization

Neural Information Processing Systems

Weaknesses: Any empirical comparison is going to have the flaw of being insufficiently exhaustive, and this one is no exception. For example: - ADVI as implemented in Stan encompasses both full-covariance and diagonal Gaussian surrogates, but this paper evaluates only one of them, and it was not clear which one until quite far in (line 297). This should be clarified earlier. Ideally it would be nice to see the relative performance of both Gaussian baselines (and perhaps other commonly suggested schemes, such as a diagonal-plus-low-rank covariance). Was RealNVP chosen because it supports sticking-the-landing? It would be useful to see a side-by-side comparison against a similar-size IAF without sticking-the-landing. - A simple method not included (maybe because it is so simple that no one has published on it for VI recently) is Polyak-Ruppert averaging, i.e., averaging the variational parameters over the final steps of stochastic optimization.
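The Polyak-Ruppert averaging the reviewer suggests is straightforward to implement: keep the noisy SGD iterates from the tail of the run and report their mean instead of the final iterate. The sketch below uses a hypothetical 1-D quadratic objective with artificially noisy gradients; the problem and all constants are illustrative, not from the paper under review.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy objective: minimize E[(theta - 1)^2] given noisy gradient evaluations.
theta = 5.0
step = 0.05
n_steps, tail_start = 2000, 1000
tail = []  # iterates from the final stretch of optimization

for t in range(n_steps):
    grad = 2.0 * (theta - 1.0) + rng.standard_normal()  # noisy gradient
    theta -= step * grad
    if t >= tail_start:
        tail.append(theta)

# Polyak-Ruppert estimate: average of the tail iterates, which smooths
# out the stationary noise that the last single iterate still carries.
theta_avg = float(np.mean(tail))
```

For variational parameters the same recipe applies componentwise: average each parameter vector over the final optimization steps.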



Reviews: Fast Black-box Variational Inference through Stochastic Trust-Region Optimization

Neural Information Processing Systems

Summary of the paper: This paper describes the use of a technique known as stochastic trust-region optimization in the context of variational inference (VI). In VI, an objective must be maximized with respect to the parameters of an approximate distribution. This optimization drives the approximate distribution q to resemble the exact posterior. In complex probabilistic graphical models the objective cannot be evaluated in closed form. An alternative is to work with a stochastic estimate obtained by Monte Carlo sampling.
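The Monte-Carlo estimate the summary refers to can be sketched on a toy problem. This is a hedged illustration, not the paper's method: the unnormalized log-joint below is a hypothetical stand-in, and the variational family is a simple Gaussian. Samples are drawn from q, and the ELBO integrand log p(z, x) - log q(z) is averaged.

```python
import numpy as np

rng = np.random.default_rng(0)

def log_joint(z):
    # Hypothetical unnormalized model: log p(z, x) = -0.5 z^2 (standard normal).
    return -0.5 * z**2

def elbo_estimate(mu, log_sigma, n_samples=4096):
    """Monte-Carlo estimate of the ELBO E_q[log p(z, x) - log q(z)]."""
    sigma = np.exp(log_sigma)
    z = mu + sigma * rng.standard_normal(n_samples)  # reparameterized samples
    log_q = (-0.5 * ((z - mu) / sigma) ** 2 - log_sigma
             - 0.5 * np.log(2 * np.pi))
    return float(np.mean(log_joint(z) - log_q))

# A well-matched q gives a higher ELBO than a badly shifted one.
good = elbo_estimate(0.0, 0.0)
bad = elbo_estimate(2.0, 0.0)
```

In a trust-region scheme, this noisy estimate (and its gradient) is what the inner subproblem is built from at each step, which is why controlling the Monte-Carlo noise matters.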